1 Introduction

This paper argues that an interlingual representation 
must explicitly represent some parts of the meaning of 
a situation as possibilities (or preferences), not as neces- 
sary or definite components of meaning (or constraints). 
Possibilities enable the analysis and generation of nu- 
ance, something required for faithful translation. Fur- 
thermore, the representation of the meaning of words is 
crucial, because it specifies which nuances words can 
convey in which contexts. 

In translation it is rare to find the exact word that 
faithfully and directly translates a word of another lan- 
guage. Often, the target language will provide many 
near-synonyms for a source language word that differ 
(from the target word and among themselves) in nuances 
of meaning. For example, the French fournir could be 
translated as provide, supply, furnish, offer, volunteer, af- 
ford, bring, and so on, which differ in fine-grained as- 
pects of denotation, emphasis, and style. (Figures 1 and 2 
show some of the distinctions.) But none of these options 
may carry the right nuances to match those conveyed by 
fournir in the source text; unwanted extra nuances may 
be conveyed, or a desired nuance may be left out. Since 
an exact match is probably impossible in many situa- 
tions, faithful translation will require uncovering the nu- 
ances conveyed by a source word and then determining 
how the nuances can be conveyed in the target language 
by appropriate word choices in any particular context. 
The inevitable mismatches that occur are one type of 
translation mismatch — differences of meaning, but not 
of form, in the source and target language (Kameyama 
et al., 1991). 1



1 A separate class of difference, translation divergence, involves dif- 
ferences in the form of the source and target texts and results from 
lexical gaps in the target language (in which no single word lexical- 
izes the meaning of a source word), and from syntactic and colloca- 
tional constraints imposed by the source language. 'Paraphrasing' the 
source text in the target language is required in order to preserve the 
meaning as much as possible (Dorr, 1994; Stede, 1996; Elhadad et al., 
1997). But even when paraphrasing, choices between near-synonyms 
will have to be made, so, clearly, translation mismatches and translation 
divergences are not independent phenomena. Just as standard semantic 
content can be incorporated or spread around in different ways, so can 
nuances of meaning. 



Provide may suggest foresight and stress the idea of making 
adequate preparation for something by stocking or ship- 
ping . . . 

Supply may stress the idea of replacing, of making up what is 
needed, or of satisfying a deficiency. 

Furnish may emphasize the idea of fitting something or some- 
one with whatever is necessary, or sometimes, normal or 
desirable. 

Figure 1: An abridged entry from Webster's New Dictio- 
nary of Synonyms (Gove, 1973). 



Offer and volunteer may both refer to a generous extending 
of aid, services, or a desired item. Those who volunteer 
agree by free choice rather than by submission to selec- 
tion or command. 

Figure 2: An abridged entry from Choose the Right 
Word (Hayakawa, 1994). 



2 Near-synonyms across languages 

This section examines how near-synonyms can differ 
within and across languages. I will discuss some of the 
specific problems of lexical representation in an interlin- 
gual MT system using examples drawn from the French 
and English versions of the multi-lingual text provided 
for this workshop. 

To be as objective as possible, I'll rely on several 
dictionaries of synonym discrimination including, for 
English, Gove (1973) and Hayakawa (1994), and for 
French, Bailly (1970), Benac (1956), and Batchelor and 
Offord (1993). Unless otherwise stated, the information 
on differences below comes from one of these reference 
books. 

Notation: Below, 'english :: french' indicates that the
pair of words or expressions english and french corre- 
spond to one another in the multi-lingual text (i.e., they 
are apparent translations of each other). 

Fine-grained denotational mismatches 

If a word has near-synonyms, then they most likely 
differ in fine-grained aspects of denotation. Consider the 
following pairs: 



1. provides :: fournit

2. provided :: apportaient 

3. provide :: offrir 

4. brought :: fournissait

5. brought :: se chargeait 

These all share the basic meaning of giving or mak- 
ing available what is needed by another, but each adds 
its own nuances. And these are not the only words 
that the translator could have used: in English, furnish, 
supply, offer, and volunteer would have been possibil- 
ities; in French, approvisionner, munir, pourvoir, nan- 
tir, presenter, among others, could have been chosen. 
The differences are complex and often language-specific. 
Figures 1 and 2 discuss some of the differences between 
the English words, and figures 3 and 4 those between 
the French words. And this is the problem for transla- 
tion: none of the words match up exactly, and the nu- 
ances they carry when they are actually used are context- 
dependent. (Also notice that the usage notes are vague in 
many cases, using words like 'may' and 'idée'.)
Consider this second example: 

6a. began :: amorce

b. began :: commença

c. started :: au début

Amorcer implies a beginning that prepares for something 
else; there is no English word that carries the same nu- 
ance, but begin appears to be the closest match. Com- 
mencer also translates as begin, although commencer is 
a general word in French, implying only that the thing 
begun has a duration. In English, begin differs from start 
in that the latter can imply a setting out from a certain 
point after inaction (in opposition to stop). 

More pairings that exhibit similar fine-grained denota- 
tional differences include these: 



7a. broaden :: élargir

b. expand :: étendre

c. increase :: accroître

8a. transformation :: passer

b. transition :: transition

9. enable :: permettre

10. opportunities :: perspectives

11. assistance :: assistance



There are two main problems in representing the 
meanings of these words. First, although some of the 
nuances could be represented by simple features, such as 
'foresight' or 'generous', most of them cannot because 
they are complex and have an 'internal' structure. They 
are concepts that relate aspects of the situation. For ex- 
ample, for furnish, 'fitting someone with what is nec- 
essary' is not a simple feature; it involves a concept of 



Fourni a rapport à la quantité et se dit de ce qui a suffisamment
ou en abondance le nécessaire.

Muni et armé sont relatifs à l'état d'une chose rendue forte ou
capable, muni, plus général, annonçant un secours pour
faire quoi que ce soit.

Pourvu comporte une idée de précaution et se dit bien en parlant
des avantages naturels donnés par une sorte de finalité

Nanti, muni d'un gage donné par un débiteur à son créancier,
par ext. muni par précaution et, absolument, assez enrichi
pour ne pas craindre l'avenir.

Figure 3: An abridged entry from Benac (1956). 



Offrir, c'est faire hommage d'une chose à quelqu'un, en manifestant
le désir qu'il l'accepte, afin que l'offre devienne
un don.

Présenter, c'est offrir une chose que l'on tient à la main ou
qui est là sous les yeux et dont la personne peut à l'instant
prendre possession.

Figure 4: An abridged entry from Bailly (1970). 



'fitting', a patient (the same patient that the overall sit- 
uation has), a thing that is provided, and the idea of 
the necessity of that thing to someone. Thus, many nuances
must be represented as fully-fledged concepts (or
instances thereof) in an interlingua.

Second, many of the nuances are merely suggested or 
implied, if they are conveyed at all. That is, they are 
conveyed indirectly — the reader has the license to decide 
that such a nuance was unintended — and as such are not 
necessary conditions for the definition of the words. This 
has ramifications for both the analysis of the source text 
and the generation of the target text because one has to 
determine how strongly a certain nuance is intended, if at 
all (in the source), and then how it should be conveyed, 
if it can be, in the target language. One should seek to 
translate indirect expressions as such, and avoid making 
them direct. One must also avoid choosing a target word 
that might convey an unwanted implication. In any case, 
aspects of word meaning that are indirect must be repre- 
sented as such in the lexicon. 

Coarse-grained denotational mismatches 

Sometimes the translator chooses a target word that is se- 
mantically quite different from the source word, yet still 
conveys the same basic idea. Considering pair 5, above: 
bring seems to mean to carry as a contribution, and se 
charger to take responsibility for. Perhaps there are 
no good equivalents in the opposite languages for these 
terms, or alternatively, the words might have been cho- 
sen because of syntactic or collocational preferences — 
they co-occur with leadership :: l'administration, which
are not close translations either. 

In fact, the desire to use natural-sounding syntactic 



and collocational structures is probably responsible for 
many of these divergences. In another case, the pair fac- 
tors :: raisons occurs perhaps because the translator did 
not want to translate literally the expressions Many factors
contributed to :: Parmi les raisons de. Such mismatches
are outside the scope of this paper, because they
fall more into the area of translation divergences. (See 
Smadja et al. (1996) for research on translating colloca- 
tions.) 

Stylistic mismatches 

Words can also differ on many stylistic dimensions, but 
formality is the most recognized dimension. 2 Consider 
the following pairs: 

12a. plans :: entend bien 
b. plan :: envisagent de 

While the French words differ in formality (entend bien 
is formal, and envisagent de is neutral), the same word 
was chosen in English. Note that the other French words 
that could have been chosen also differ in formality: se 
proposent de has intermediate formality, and comptent, 
avont l'intention, and projetent de are all neutral.

Similarly, in 6, above, amorcer is more formal than 
commencer. Considering the other near-synonyms: the 
English commence and initiate are quite formal, as is the 
French initier. Debuter and demarrer are informal, yet 
both are usually translated by begin, a neutral word in 
English. (Notice also that the French cognate of the for- 
mal English commence, commencer, is neutral.) 

Style, which can be conveyed by both the words and 
the structure of a text, is best represented as a global 
property in an interlingual representation. That way, it 
can influence all decisions that are made. (It is probably 
not always necessary to preserve the style of particular 
words across languages.) 

A separate issue of style in this text is its use of techni- 
cal or domain-specific vocabulary. Consider the follow- 
ing terms used to refer to the subject of the text: 

13a. institution :: institution 

b. institution :: etablissement 

c. institution :: association 

d. joint venture :: association 

e. programme :: association 

f. bank :: etablissement 

g. bank :: banque 

In French, it appears that association must be used to re- 
fer to non-profit companies and etablissement or banque 
for their regulated (for-profit) counterparts. In English 
institution, among other terms, is used for both. Con- 
sider also the following pairs: 

2 Hovy (1988) suggests others including force and floridity, and Di- 
Marco et al. (1993) suggest concreteness or vividness. Actually, it 
seems that the French text is more vivid — if a text on banking can be 
considered vivid at all — than the English, using words such as baptisee, 
eclatant, contagieux, and demunis. 



14a. seed capital :: capital initial

b. working capital :: fonds de roulement

c. equity capital :: capital social

Attitudinal mismatches 

Words also differ in the attitude that they express. For example,
in poor :: démunis, poor can express a derogatory
attitude, but démunis (which can be translated as impoverished)
probably expresses a neutral attitude. Consider
also people of indigenous background :: Indiens. Attitudes
must be included in the interlingual representation
of an expression, and they must refer to the specific participant(s)
about whom the speaker is expressing an attitude.

3 Representing near-synonyms 

Before I discuss the requirements of the interlingual rep- 
resentation, I must first discuss how the knowledge of 
near-synonyms ought to be modelled if we are to account 
for the complexities of word meaning in an interlingua. 
In the view taken here, the lexicon is given the central 
role as bridge between natural language and interlingua. 

The conventional model of lexical knowledge, used 
in many computational systems, is not suitable for rep- 
resenting the fine-grained distinctions between near- 
synonyms (Hirst, 1995). In the conventional model, 
knowledge of the world is represented by ostensibly 
language-neutral concepts that are often organized as an 
ontology. The denotation of a lexical item is represented 
as a concept, or a configuration of concepts, and amounts 
to a direct word-to-concept link. So except for polysemy 
and (absolute) synonymy, there is no logical difference 
between a lexical item and a concept. Therefore, words 
that are nearly synonymous have to be linked each to 
their own slightly different concepts. The problem comes 
in trying to represent these slightly different concepts and 
the relationships between them. Hirst (1995) shows that 
one ends up with an awkward proliferation of language- 
dependent concepts, contrary to the interlingual function 
of the ontology. And this assumes we can even build a 
representative taxonomy from a set of near-synonyms to 
begin with. 

Moreover, the denotation of a word is taken to em- 
body the necessary and sufficient conditions for defining 
the word. While this has been convenient for text anal- 
ysis and lexical choice, since a denotation can be used 
as an applicability condition of the word, the model is 
inadequate for representing the nuances of meaning that 
are conveyed indirectly, which, clearly, are not necessary 
conditions. 

An alternative representation is suggested by the prin- 
ciple behind Gove's (1973) synonym usage notes. Words 
are grouped into an entry if they have the same essential
meaning, i.e., that they "can be defined in the same terms 
up to a certain point" (p. 25a) and differ only in terms of 
minor ideas involved in their meanings. We combine this 
principle with Saussure's paradigmatic view that "each 




Figure 5: The clustered model of lexical knowledge. 



of a set of synonyms . . . has its particular value only be- 
cause they stand in contrast with one another" (Saussure, 
1983, p. 114) and envision a representation in which the
meaning of a word arises out of a combination of its es- 
sential denotation (shared with other words) and a set of 
explicit differences to its near-synonyms. 

Thus, I propose a clustered model of lexical knowl- 
edge, depicted in figure 5. A cluster has two levels of 
representation: a core concept and peripheral concepts. 
The core concept is a denotation as in the conventional 
model — a configuration of concepts (that are defined in 
the ontology) that functions as a necessary applicabil- 
ity condition (for choice) — but it is shared by the near- 
synonyms in the cluster. In the figure, the ontological 
concepts are shown as rectangles; in this case all three 
clusters denote the concept of MAKING-AVAILABLE. All
of the peripheral concepts that the words may differ 
in denoting, suggesting, or emphasizing are also repre- 
sented as configurations of concepts, but they are explic- 
itly distinguished from the core concept as indirect mean- 
ings that can be conveyed or not depending on the con- 
text. In the figure, the differences between words (in a 
single language) are shown as dashed lines; not all words 
need be differentiated. Stylistic, attitudinal, and colloca- 
tional factors are also encoded in the cluster. 
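
As a rough illustration only, the two-level cluster described above could be sketched as a small data structure. The class names, field names, and peripheral concept labels (Foresight, SatisfyingDeficiency, FittingWithNecessities) are assumptions made for this sketch, loosely based on the usage notes in figure 1; they are not taken from any implemented system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Distinction:
    """One peripheral concept that a word may convey (hypothetical fields)."""
    concept: str              # name of an ontological concept, e.g. "Foresight"
    kind: str                 # "denotation", "implication", "suggestion" or "emphasis"
    strength: str = "medium"  # "weak", "medium" or "strong"


@dataclass
class NearSynonym:
    word: str
    distinctions: list = field(default_factory=list)
    formality: str = "neutral"  # one stylistic dimension, for illustration


@dataclass
class Cluster:
    language: str
    core_concept: str  # shared by every member; acts as the constraint
    members: dict = field(default_factory=dict)

    def add(self, synonym):
        self.members[synonym.word] = synonym

    def candidates(self, concept):
        """Members are applicable only when the core concept matches."""
        return sorted(self.members) if concept == self.core_concept else []


# Illustrative English cluster; the concept labels are assumed.
provide_cluster = Cluster("en", "MAKING-AVAILABLE")
provide_cluster.add(NearSynonym("provide", [Distinction("Foresight", "suggestion")]))
provide_cluster.add(NearSynonym("supply", [Distinction("SatisfyingDeficiency", "emphasis")]))
provide_cluster.add(NearSynonym("furnish", [Distinction("FittingWithNecessities", "emphasis")]))
```

In this sketch the core concept decides whether a cluster is activated at all, while the distinctions are kept separate so that they can later be treated as optional, context-dependent meanings.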



Each language has its own set of clusters. Corre- 
sponding clusters (across languages) need not have the 
same peripheral concepts since languages may differen- 
tiate their synonyms in entirely different terms. Differ- 
ences across languages are represented, for convenience, 
by dashed lines between clusters, though these would not 
be used in pure interlingual MT. Essentially, a cluster is 
a language-specific formal usage note, an idea originated
by DiMarco et al. (1993) that Edmonds (forthcoming) is 
formalizing. 

4 Interlingual representation 

Crucially, an interlingual representation should not be 
tied to any particular linguistic structure, whether lexi- 
cal or syntactic. 

Assuming that one has constructed an ontology or do- 
main model (of language-neutral concepts), an interlin- 
gual representation of a situation is, for us, an instan- 
tiation of part of the domain knowledge. Both Stede 
(1996) and Elhadad et al. (1997) have developed such 
formalisms for representing the input to natural language 
generation applications (the former to multilingual gen- 
eration), but they are applicable to interlingual MT as 
well. The formalisms allow their applications to para- 
phrase the same input in many ways including realiz- 



ing information at different syntactic ranks and cover- 
ing/incorporating the input in different ways. For them, 
generation is a matter of satisfying two types of con- 
straints: (1) covering the whole input structure with a set 
of word denotations (thereby choosing the words), and 
(2) building a well-formed syntactic structure out of the 
words. But while their systems can provide many options 
to choose from, they lack the complementary ability to 
actually choose which is the most appropriate. 

Now, finding the most appropriate translation of a 
word involves a tradeoff between many possibly conflict- 
ing desires to express certain nuances in certain ways, 
to establish the right style, to observe collocational pref- 
erences, and to satisfy syntactic constraints. This sug- 
gests that lexical choice is not a matter of satisfying con- 
straints (i.e., of using the necessary applicability condi- 
tions of a word), but rather of attempting to meet a large 
set of preferences. Thus, a distinction must be made be- 
tween knowledge that should be treated as preferences as 
opposed to constraints in the interlingual representation. 
In the generation stage of MT, one attempts to choose 
the near-synonym from a cluster (activated because of 
the constraints) whose peripheral concepts best meet the 
most preferences. 
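
One simple way to picture choice by preference, offered here only as a sketch, is to score each near-synonym in an activated cluster by the weighted preferences that its peripheral concepts meet. The weighting scheme, the function name choose, and the concept labels are all assumptions; the paper does not specify a scoring function.

```python
def choose(cluster, preferences):
    """Pick the near-synonym whose peripheral concepts best meet the preferences.

    cluster:     dict mapping each word to the set of peripheral concepts
                 it can convey (the core concept is assumed to match already).
    preferences: dict mapping a concept to a weight; a negative weight
                 marks a nuance that should be avoided.
    """
    def score(word):
        return sum(weight for concept, weight in preferences.items()
                   if concept in cluster[word])

    # sorted() makes ties deterministic: the alphabetically first word wins.
    return max(sorted(cluster), key=score)


# Hypothetical cluster and preferences, for illustration only.
cluster = {
    "provide": {"Foresight"},
    "supply": {"SatisfyingDeficiency"},
    "furnish": {"FittingWithNecessities"},
}
best = choose(cluster, {"Foresight": 1.0, "SatisfyingDeficiency": 0.5})
```

Because no word is ruled out for failing a preference, a word is always returned; unmet preferences merely lower its score, which is the sense in which preferences differ from constraints.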

Turning to the analysis stage of MT, since many nu- 
ances are expressed indirectly and are influenced by the 
context, one cannot know for sure whether they have 
been expressed unless one performs a very thorough 
analysis. Indeed, it might not be possible for even a thor- 
ough analysis to decide whether a nuance was expressed, 
or how indirectly it was expressed, given the context- 
dependent nature of word meaning. Thus, on the basis of 
the knowledge of what words can express, stored in the 
clusters, the analysis stage would output an interlingual 
representation that includes possibilities of what was ex- 
pressed. The possibilities then become preferences dur- 
ing generation. 
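
The hand-off from analysis to generation can be sketched as follows. This is a minimal illustration under stated assumptions: the lexicon layout, the function names, and the numeric weights attached to each strength are hypothetical and serve only to show possibilities being recorded in analysis and reused as preferences in generation.

```python
# Assumed mapping from strength to a numeric preference weight.
STRENGTH_WEIGHT = {"weak": 0.25, "medium": 0.5, "strong": 1.0}


def analyse(source_word, lexicon):
    """Return every nuance the source word can express, as possibilities.

    No attempt is made to decide whether the nuance was actually intended;
    that uncertainty is what the possibilities are meant to preserve.
    """
    return list(lexicon.get(source_word, []))


def to_preferences(possibilities):
    """Turn (concept, strength) possibilities into weighted preferences."""
    preferences = {}
    for concept, strength in possibilities:
        weight = STRENGTH_WEIGHT[strength]
        preferences[concept] = max(preferences.get(concept, 0.0), weight)
    return preferences


# Hypothetical French entries: amorcer implies a preparation, commencer is general.
lexicon = {"amorcer": [("Preparing", "medium")], "commencer": []}
prefs = to_preferences(analyse("amorcer", lexicon))
```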

5 Examples 

Figures 6-9 give examples of interlingual representations 
for four segments of the text that involve some of the 
words discussed in section 2. Since my focus is on word 
meanings, I will not give complete representations of the 
expressions. Also note that while I use specific ontologi- 
cal concepts in these descriptions, this in no way implies 
that I claim these are the right concepts to represent — in 
fact, some are quite crude. A good ontology is crucial 
to MT, and I assume that such an ontology will in due 
course be constructed. 

I have used attribute-value structures, but any equiv- 
alent formalism would do. Square brackets enclose re- 
cursive structures of instantiations of ontological con- 
cepts. Names of instances are in lowercase; concepts 
are capitalized; relations between instances are in up- 
percase; and cross-reference is indicated by a digit in 
a square. A whole interlingual representation is sur- 
rounded by brace brackets and consists of exactly one 



specification of the situation and any number of possi- 
bilities, attitudes, and stylistic preferences. The 'situa- 
tion' encodes the information one might find in a tradi- 
tional interlingual representation — the definite portion of 
meaning to be expressed. A 'possibility' takes as a value 
a four-part structure of (1) frequency (never, sometimes, 
or always), which represents the degree of possibility; 
(2) strength (weak, medium, or strong), which represents 
how strongly the nuance is conveyed; (3) type (emphasis, 
suggestion, implication, or denotation), which represents 
how the nuance is conveyed; and (4) an instance of a con- 
cept. The 'style' and 'attitude' attributes should be self- 
explanatory. As for content, some of the meanings were 
discussed in section 2, and the rest are derived from the 
aforementioned dictionaries. Comments (labelled with 
'%') are included to indicate which words gave rise to 
which possibilities. 
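
For readers who prefer code to attribute-value notation, the structure just described might be sketched as below. The class names and the dictionary encoding of instances are assumptions of this sketch, and the frequency and strength values in the example are placeholders, not values taken from the figures.

```python
from dataclasses import dataclass, field

FREQUENCIES = ("never", "sometimes", "always")
STRENGTHS = ("weak", "medium", "strong")
TYPES = ("emphasis", "suggestion", "implication", "denotation")


@dataclass(frozen=True)
class Possibility:
    frequency: str    # degree of possibility
    strength: str     # how strongly the nuance is conveyed
    type: str         # how the nuance is conveyed
    concept: dict     # an instance of an ontological concept
    source: str = ""  # plays the role of the '%' comments

    def __post_init__(self):
        for value, allowed in ((self.frequency, FREQUENCIES),
                               (self.strength, STRENGTHS),
                               (self.type, TYPES)):
            if value not in allowed:
                raise ValueError(f"{value!r} is not one of {allowed}")


@dataclass
class Interlingua:
    situation: dict  # exactly one, hence a required field
    possibilities: list = field(default_factory=list)
    attitudes: list = field(default_factory=list)
    style: dict = field(default_factory=dict)


# Loosely modelled on the 'transition ... began in 1989' example;
# "sometimes" and "medium" are placeholder values.
rep = Interlingua(
    situation={"instance-of": "Beginning",
               "OBJECT": {"instance-of": "StateChange"},
               "TIME": {"instance-of": "Year"}},
    possibilities=[Possibility("sometimes", "medium", "implication",
                               {"instance-of": "Preparing"}, source="amorcée")],
    style={"formality": "high"},
)
```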

6 Conclusion 

This paper has motivated the need to represent possibili- 
ties (or preferences) in addition to necessary components 
(or constraints) in the interlingual representation of a sit- 
uation. Possibilities are required because words can con- 
vey a myriad of sometimes indirect nuances of meaning 
depending on the context. Some examples of how one 
could represent possibilities were given. 
{ situation [1] [ begin
                  instance-of Beginning
                  OBJECT [ transition
                           instance-of StateChange ]
                  TIME   [ year-1989
                           instance-of Year ] ]
  possibility ( type implication
                concept [ prepare2
                          instance-of Preparing
                          AGENT [1] ] )        %from 'amorcée'
  style ( formality ( level high ) ) }

"The transition . . . began in 1989."
"La transition, amorcée en 1989 . . ."

Figure 8: Interlingual representation with a stylistic preference (for high formality).



{ situation [1] [ workers
                  instance-of Worker
                  ATTRIBUTE [ poor
                              instance-of Poor
                              DEGREE high ]
                  ATTRIBUTE [ self-employed
                              instance-of EmploymentStatus ] ]
  attitude ( type neutral
             of [1] ) }

"the very poor self-employed"
"travailleurs indépendants les plus démunis"

Figure 9: Interlingual representation with an expressed attitude.



